Processing and Normalizing Hashtags

Authors

  • Thierry Declerck
  • Piroska Lendvai
Abstract

We present ongoing work in linguistic processing of hashtags in Twitter text, with the goal of supplying normalized hashtag content to be used in more complex natural language processing (NLP) tasks. Hashtags represent collectively shared topic designators with considerable surface variation that can hamper semantic interpretation. Our normalization scripts allow for the lexical consolidation and segmentation of hashtags, potentially leading to improved semantic classification.
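
The abstract names two normalization steps, lexical consolidation and segmentation, but does not show the authors' scripts. The Python sketch below illustrates one common way to carry out both steps. It is not the authors' method. It first splits a hashtag on case, digit and underscore boundaries. For lowercase runs that are not in the lexicon, it falls back to dictionary-based segmentation. It then groups surface variants under a shared normalized key.

Several parts are assumptions made for illustration. The toy lexicon `VOCAB` and the functions `dp_segment`, `segment_hashtag`, `normalize_hashtag` and `consolidate` are hypothetical. The regular expression only handles ASCII letters.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon; a real system would use a large word list or corpus counts.
VOCAB = {"world", "cup", "climate", "change", "new", "york", "election"}

# Capital runs (acronyms), capitalized or lowercase words, and digit runs.
# Underscores, hyphens and other symbols are never matched, so they act as separators.
PART_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def dp_segment(token, vocab):
    """Split a lowercase token into the fewest vocabulary words (dynamic programming).

    Returns [token] unchanged if no complete segmentation exists.
    """
    n = len(token)
    best = [None] * (n + 1)  # best[i]: shortest word list covering token[:i]
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            if best[j] is not None and token[j:i] in vocab:
                candidate = best[j] + [token[j:i]]
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    return best[n] if best[n] is not None else [token]


def segment_hashtag(tag, vocab=VOCAB):
    """Return the lowercased word segments of a hashtag such as '#WorldCup'."""
    words = []
    for part in PART_RE.findall(tag.lstrip("#")):
        low = part.lower()
        if low.isalpha() and low not in vocab:
            # e.g. 'worldcup' has no case cues, so fall back to the lexicon.
            words.extend(dp_segment(low, vocab))
        else:
            words.append(low)
    return words


def normalize_hashtag(tag, vocab=VOCAB):
    """Normalized key: space-joined lowercase segments."""
    return " ".join(segment_hashtag(tag, vocab))


def consolidate(tags, vocab=VOCAB):
    """Group surface variants of hashtags under their shared normalized key."""
    groups = defaultdict(list)
    for tag in tags:
        groups[normalize_hashtag(tag, vocab)].append(tag)
    return dict(groups)


print(consolidate(["#WorldCup", "#worldcup", "#world_cup", "#WORLDCUP"]))
print(normalize_hashtag("#newyork2014"))
```

The sketch breaks ties with a "fewest words" rule for simplicity. With a realistic lexicon, ambiguous splits would likely need word-frequency or language-model scores instead. Tokens the lexicon cannot cover (for example `nyc`) pass through unsegmented.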

Similar Resources

Towards the Representation of Hashtags in Linguistic Linked Open Data Format

A pilot study is reported on developing the basic Linguistic Linked Open Data (LLOD) infrastructure for hashtags from social media posts. Our goal is the encoding of linguistically and semantically enriched hashtags in a formally compact way using the machine-readable OntoLex model. Initial hashtag processing consists of data-driven decomposition of multi-element hashtags, the linking of spellin...

Twitter hashtags: Joint Translation and Clustering

The popularity of microblogging platforms, such as Twitter, renders them valuable real-time information resources for tracking various aspects of worldwide events, e.g., earthquakes, political elections, etc. Such events are usually characterized in microblog posts via the use of hashtags (#). As microbloggers come from different backgrounds, and express themselves in different languages, we wi...

Similarity measurement for describe user images in social media

Online social networks like Instagram are places for communication. These media also produce rich metadata that are useful for further analysis in many fields, including health and cognitive science. Many researchers use these metadata, such as hashtags and images, to detect patterns of user activity. However, there are several serious ambiguities, such as how reliable these informa...

Impact of Feature Selection on Micro-Text Classification

Social media datasets – especially Twitter tweets – are popular in the field of text classification. Tweets are a valuable source of microtext (sometimes referred to as “micro-blogs”), and have been studied in domains such as sentiment analysis, recommendation systems, spam detection, clustering, among others [6]. Tweets often include keywords referred to as “Hashtags” that can be used as labels fo...

Twitter Hash Tag Recommendation

The rise in popularity of microblogging services like Twitter has led to increased use of content annotation strategies like the hashtag. Hashtags provide users with a tagging mechanism to help organize, group, and create visibility for their posts. This is a simple idea but can be challenging for the user in practice which leads to infrequent usage. In this paper, we will investigate various m...

Publication date: 2015